Multi-target Extractor and Detector for Unknown-number Speaker Diarization
Abstract
Strong representations of target speakers can help extract important information about speakers and detect corresponding temporal regions in multi-speaker conversations. In this study, we propose a neural architecture that simultaneously extracts speaker representations consistent with the speaker diarization objective and detects the presence of each speaker on a frame-by-frame basis, regardless of the number of speakers in a conversation. A speaker representation (called z-vector) extractor and a time-speaker contextualizer, implemented by a residual network and processing data in both temporal and speaker dimensions, are integrated into a unified framework. Tests on the CALLHOME corpus show that our model outperforms most of the methods proposed so far. Evaluations in a more challenging case, with the number of simultaneous speakers ranging from 2 to 7, show that our model achieves 6.4% to 30.9% relative diarization error rate reductions over several typical baselines.
Related articles
Multi-stage Speaker Diarization for Conference and Lecture Meetings
The LIMSI RT-07S speaker diarization system for the conference and lecture meetings is presented in this paper. This system builds upon the RT06S diarization system designed for lecture data. The baseline system combines agglomerative clustering based on Bayesian information criterion (BIC) with a second clustering using state-of-the-art speaker identification (SID) techniques. Since the baseli...
Multi-stream speaker diarization systems for the meetings domain
In the context of speech and speaker recognition systems, it is well known that the combination of different feature streams can improve significantly their performance. However, the application of multi-stream (MS) techniques to speaker diarization systems has not been extensively studied. In this paper, we address this issue: we formulate different MS techniques, such as feature combination, ...
Comparing Multi-Stage Approaches for Cross-Show Speaker Diarization
Acoustic speaker diarization is investigated for situations where a collection of shows from the same source needs to be processed. In this case, the same speaker should receive the same label across all shows. We compare different architectures for cross-show speaker diarization: the obvious concatenation of all shows, a hybrid system combining first a local clustering stage followed by a glob...
Unsupervised Methods for Speaker Diarization
Given a stream of unlabeled audio data, speaker diarization is the process of determining “who spoke when.” We propose a novel approach to solving this problem by taking advantage of the effectiveness of factor analysis as a front-end for extracting speaker-specific features and exploiting the inherent variabilities in the data through the use of unsupervised methods. Upon initial evaluation, o...
Integrating online i-vector extractor with information bottleneck based speaker diarization system
Conventional approaches to speaker diarization use short-term features such as Mel Frequency Cepstral Coefficients (MFCC). Features such as i-vectors have been used on longer segments (minimum 2.5 seconds of speech). Using i-vectors for speaker diarization has been shown to be beneficial as it models speaker information explicitly. In this paper, the i-vector modelling technique is adapted to ...
Journal
Journal title: IEEE Signal Processing Letters
Year: 2023
ISSN: 1558-2361, 1070-9908
DOI: https://doi.org/10.1109/lsp.2023.3279781